Collection, spreadsheet design and digitizing
Data creation is the systematic collection of data for a specific purpose, followed by digitizing the data for downstream statistical analysis, sharing, or reuse by others.
| Time | Task |
|---|---|
| 5 min | Introduction |
| 15 min | Design method |
| 10 min | Discuss your decisions |
| 30 min | Go outside and collect data |
| 30 min | Discussion on data collection and digitizing |
Automatic data collection (e.g. weather station, LI-COR)
Manual data collection (sampling, measuring, counting)
Make groups of 4-5 students.
Decide what data you want to collect (e.g. snow depth, length of icicles, plant height).
Decide on a method to collect the data (e.g. paper, phone).
Design a spreadsheet/protocol. Think about which information you need to record.
Reflect on the decisions you made and whether you would change anything if you had the means.
Go outside and collect your data. It is not important to collect as many data points as possible.
Reflect on whether the method you used was suitable for the data you collected.
How did it go?
Was the method appropriate?
Did you miss any information?
Logistical issues
Calibration of instruments
Multiple measurements/observations/samples
Template/protocol for sampling (multiple data collectors, over time)
Take notes during data collection
Collect metadata that could be useful for wider usage
Source: britishecologicalsociety.org/publications/guides-to
What would be your strategy for digitizing the data?
What problems could arise?
Use data validation tools for data entry.
Format cells (e.g. dates)
Set ranges for numeric values
Use drop-down menus for categories
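The same three checks can also be run in code after data entry. The following is a minimal sketch in Python, not a prescribed tool: the column names (`date`, `site`, `snow_depth_cm`), the list of sites, and the depth range are all hypothetical and should be replaced with the rules from your own protocol.

```python
from datetime import date

# Hypothetical rules for a snow-depth protocol; replace with your own.
ALLOWED_SITES = {"A", "B", "C"}   # stands in for a drop-down menu
DEPTH_RANGE_CM = (0.0, 300.0)     # stands in for a permitted cell range


def validate_row(row):
    """Return a list of problems in one entered row (a dict of strings)."""
    problems = []

    # The date must parse as an ISO 8601 calendar date, e.g. 2024-01-15.
    try:
        date.fromisoformat(row["date"])
    except ValueError:
        problems.append(f"bad date: {row['date']!r}")

    # Categorical values must come from the fixed list of levels.
    if row["site"] not in ALLOWED_SITES:
        problems.append(f"unknown site: {row['site']!r}")

    # Measurements must be numeric and fall inside the plausible range.
    try:
        depth = float(row["snow_depth_cm"])
    except ValueError:
        problems.append(f"not a number: {row['snow_depth_cm']!r}")
    else:
        low, high = DEPTH_RANGE_CM
        if not low <= depth <= high:
            problems.append(f"out of range: {depth}")

    return problems


# Made-up example rows: the first is valid, the second breaks all three rules.
rows = [
    {"date": "2024-01-15", "site": "A", "snow_depth_cm": "42.5"},
    {"date": "15.01.2024", "site": "D", "snow_depth_cm": "-3"},
]
for number, row in enumerate(rows, start=1):
    for problem in validate_row(row):
        print(f"row {number}: {problem}")
```

Validation at entry time (in the spreadsheet) prevents errors, while a check like this one only detects them afterwards, so the two approaches complement each other.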
file names
variable names
factor levels
missing data
notes
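Inconsistencies in factor levels and missing-data codes are easy to overlook by eye but simple to detect in code. Below is a small Python sketch under assumed conventions: the set of spellings treated as "missing" and the single `NA` code are illustrative choices, not a standard.

```python
from collections import defaultdict

# Assumed spellings of "missing" that should all become one code.
MISSING_CODES = {"", "na", "n/a", "-", "missing"}
NA = "NA"


def standardize_missing(value):
    """Map every assumed spelling of 'missing' onto the single NA code."""
    return NA if value.strip().lower() in MISSING_CODES else value


def inconsistent_levels(values):
    """Find factor levels that differ only in case or surrounding whitespace."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().lower()].add(value)
    # Keep only the groups that contain more than one spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Made-up species column with three spellings of the same level.
species = ["birch", "Birch", "birch ", "pine"]
print(inconsistent_levels(species))
```

The function only reports the variants; deciding which spelling is correct remains a decision for the data collector.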
Digitize data
Keep raw data raw
No calculations in raw data
Code-based data cleaning
Clean data
Document your data (data about your data)
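The workflow above (raw data stays untouched, cleaning happens in code, and the result is saved as a separate clean file) can be sketched as follows. This is an illustration in Python only: the raw column headers (`Date`, `Site`, `Snow depth (cm)`), the clean variable names, and the specific cleaning steps are assumptions made for the example.

```python
import csv
import io
from pathlib import Path

# Hypothetical variable names for the clean file.
CLEAN_COLUMNS = ["date", "site", "snow_depth_cm"]


def clean_text(raw_text):
    """Return cleaned CSV text built from raw CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CLEAN_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            # Strip stray whitespace left over from data entry.
            "date": row["Date"].strip(),
            # Use one spelling per site (upper case).
            "site": row["Site"].strip().upper(),
            # Turn decimal commas into decimal points.
            "snow_depth_cm": row["Snow depth (cm)"].strip().replace(",", "."),
        })
    return out.getvalue()


def clean_file(raw_path, clean_path):
    """Read the raw file and write the result to a different file."""
    raw_text = Path(raw_path).read_text(encoding="utf-8")
    Path(clean_path).write_text(clean_text(raw_text), encoding="utf-8")


# Made-up raw entry with extra spaces, a lower-case site and a decimal comma.
raw = 'Date,Site,Snow depth (cm)\n2024-01-15, a ,"12,5"\n'
print(clean_text(raw))
```

Because every change is written down as a line of code, the cleaning can be rerun, reviewed, and corrected without ever editing the raw file by hand.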
BES guide for Data management
BioStats book Data collection
Broman, Karl W, and Kara H Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10.
Open, reproducible, and transparent science course